Incorporating Latent Semantic Indexing into Spectral Graph Transducer for Text Classification

Authors

  • Xinyu Dai
  • Baoming Tian
  • Junsheng Zhou
  • Jiajun Chen
Abstract

Spectral Graph Transducer (SGT) is one of the leading graph-based transductive learning methods for classification. For SGT, a good graph representation of the data to be processed is very important. In this paper, we incorporate Latent Semantic Indexing (LSI) into SGT for text classification. First, we use LSI to represent documents as vectors in a latent semantic space, because we hypothesize that documents and their semantic relationships are reflected more faithfully in this latent space. Next, we construct the graph required by SGT, in which each node corresponds to one LSI document vector. Finally, we apply SGT to this graph to classify the documents. Experiments on both English and Chinese text classification datasets gave excellent results and support our hypothesis.
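
The abstract outlines a three-stage pipeline: LSI document vectors, then a graph over those vectors, then SGT. It gives no implementation details. The Python/NumPy code below is a minimal sketch of how such a pipeline could be wired together. It is not the authors' implementation, and everything concrete in it is an assumption:

- The function names (`tfidf_matrix`, `lsi_embed`, `knn_graph`, `sgt`, `classify`) are hypothetical.
- The smoothed TF-IDF weighting, the LSI dimension, the cosine k-nearest-neighbour graph, and the parameters `d` and `c` are illustrative choices.
- The SGT step loosely follows Joachims' original formulation (ICML 2003). It uses the unnormalized Laplacian L = B - A, replaces the eigenvalues with i², and takes the smallest real eigenvalue of a 2d × 2d block matrix to satisfy the spherical constraint.

```python
# Hypothetical sketch: names, weighting scheme and parameters are illustrative assumptions.
import numpy as np


def tfidf_matrix(docs: list[str]) -> np.ndarray:
    """Docs x terms TF-IDF matrix from whitespace-tokenized strings (smoothed idf)."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    index = {w: j for j, w in enumerate(vocab)}
    tf = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for w in doc.split():
            tf[i, index[w]] += 1.0
    df = (tf > 0).sum(axis=0)
    idf = np.log((1.0 + len(docs)) / (1.0 + df)) + 1.0
    return tf * idf


def lsi_embed(x: np.ndarray, k: int) -> np.ndarray:
    """Map documents into a k-dim latent semantic space via truncated SVD
    X ~ U_k S_k V_k^T; rows of U_k S_k are the document vectors."""
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    docs = u[:, :k] * s[:k]
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    return docs / np.maximum(norms, 1e-12)  # unit rows, so dot product = cosine


def knn_graph(vecs: np.ndarray, k: int) -> np.ndarray:
    """Symmetrized k-nearest-neighbour graph; each node is one LSI vector and each
    row's edge weights are cosine similarities normalized over its k neighbours."""
    n = vecs.shape[0]
    sim = np.clip(vecs @ vecs.T, 0.0, None)  # cosine similarity, negatives dropped
    np.fill_diagonal(sim, 0.0)  # no self-loops
    a = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(-sim[i])[:k]  # indices of the k most similar documents
        a[i, nbrs] = sim[i, nbrs] / max(sim[i, nbrs].sum(), 1e-12)
    return a + a.T


def sgt(a: np.ndarray, labels: np.ndarray, d: int, c: float = 100.0) -> np.ndarray:
    """SGT sketch. labels: +1/-1 for training docs, 0 for test docs
    (at least one of each class). Returns +1/-1 for every node."""
    n = a.shape[0]
    lap = np.diag(a.sum(axis=1)) - a  # unnormalized Laplacian L = B - A
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    v = vecs[:, 1:d + 1]  # eigenvectors 2..d+1 (skip the smallest, constant if connected)
    dmat = np.diag(np.arange(1.0, d + 1.0) ** 2)  # spectrum replaced by i^2
    n_pos, n_neg = np.sum(labels > 0), np.sum(labels < 0)
    g_pos, g_neg = np.sqrt(n_neg / n_pos), -np.sqrt(n_pos / n_neg)
    gamma = np.where(labels > 0, g_pos, np.where(labels < 0, g_neg, 0.0))
    cost = np.where(labels > 0, (n_pos + n_neg) / (2.0 * n_pos),
                    np.where(labels < 0, (n_pos + n_neg) / (2.0 * n_neg), 0.0))
    g = dmat + c * v.T @ (cost[:, None] * v)  # G = D + c V^T C V
    b = c * v.T @ (cost * gamma)  # b = c V^T C gamma
    eye = np.eye(d)
    block = np.block([[g, -eye], [-np.outer(b, b) / n, g]])
    ev = np.linalg.eigvals(block)
    tol = 1e-6 * (1.0 + np.abs(ev.real))
    lam = np.min(np.where(np.abs(ev.imag) <= tol, ev.real, np.inf))  # smallest real one
    if not np.isfinite(lam):  # defensive fallback, should not normally trigger
        lam = np.min(ev.real)
    w = np.linalg.lstsq(g - lam * eye, b, rcond=None)[0]  # lstsq tolerates singularity
    z = v @ w  # real-valued scores, usable for ranking
    theta = (g_pos + g_neg) / 2.0
    return np.where(z >= theta, 1, -1)


def classify(docs: list[str], labels: list[int], lsi_dim: int = 2,
             knn: int = 2, d: int = 2, c: float = 100.0) -> np.ndarray:
    """LSI vectors -> kNN graph -> SGT, chained in the order the abstract describes."""
    vecs = lsi_embed(tfidf_matrix(docs), lsi_dim)
    return sgt(knn_graph(vecs, knn), np.asarray(labels), d, c)


if __name__ == "__main__":
    # Path graph 0-1-2-3 with only the ends labeled: the retained eigenvector is
    # the Fiedler vector, so nodes 0,1 get +1 and nodes 2,3 get -1.
    path = np.diag(np.ones(3), 1) + np.diag(np.ones(3), -1)
    print(sgt(path, np.array([1, 0, 0, -1]), d=1))  # [ 1  1 -1 -1]
    docs = ["goal team match goal", "team coach match win", "goal win team season",
            "bake flour oven bread", "oven bread recipe bake", "flour recipe bake cake"]
    print(classify(docs, [1, 0, 0, -1, 0, 0]))
```

The path-graph call is a quick sanity check. With only the two end nodes labeled, the single retained eigenvector is the path's Fiedler vector, so the two left nodes receive +1 and the two right nodes receive -1. On real corpora, `lsi_dim`, `knn`, `d` and `c` would need tuning; the toy values here are placeholders.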

Similar resources

Integrating Background Knowledge into Nearest-Neighbor Text Classification

This paper describes two different approaches for incorporating background knowledge into nearest-neighbor text classification. Our first approach uses background text to assess the similarity between training and test documents rather than assessing their similarity directly. The second method redescribes examples using Latent Semantic Indexing on the background knowledge, assessing document similari...

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...

A Latent Semantic Structure Model for Text Classification

Latent Semantic Indexing (LSI) has been successfully applied to information retrieval and classification. LSI can deal with the problems of polysemy and synonymy, and can reduce noise in the raw document-term matrix. However, LSI may ignore important features for some small categories because they are not the most important features for all the document collection. In this paper, we describe a ...

Latent Semantic Indexing for Patent Documents

Since the huge database of patent documents keeps growing, classifying, updating and retrieving patent documents has become an acute necessity. Therefore, we investigate the efficiency of applying Latent Semantic Indexing, an automatic indexing method of information retrieval, to some classes of patent documents from the United States Patent Classification System. We ...

Classification and clustering methods for documents by probabilistic latent semantic indexing model

Based on information retrieval models, especially the probabilistic latent semantic indexing (PLSI) model, we discuss methods for classification and clustering of a set of documents. A classification method is presented, and its good performance is demonstrated by applying it to a set of benchmark documents with free format (text only). Then the classification method is modified to a clustering metho...

Publication date: 2008